Generation of Synthetic Spatially Embedded Power Grid Networks

Saleh Soltan and Gil Zussman 


Abstract —The development of algorithms for enhancing the 
resilience and efficiency of the power grid requires performance 
evaluation with real topologies of power transmission networks. 
However, due to security reasons, such topologies and particularly 
the locations of the substations and the lines are usually not publicly 
available. Therefore, we study the structural properties of the 
North American grids and present an algorithm for generating 
synthetic spatially embedded networks with similar properties to 
a given grid. The algorithm uses the Gaussian Mixture Model 
(GMM) for density estimation of the node positions and generates 
a set of nodes with similar spatial distribution to the nodes in a 
given network. Then, it uses two procedures, which are inspired 
by the historical evolution of the grids, to connect the nodes. The 
algorithm has several tunable parameters that allow generating 
grids similar to any given grid. Particularly, we apply it to the 
Western Interconnection (WI) and to grids that operate under the 
SERC Reliability Corporation (SERC) and the Florida Reliability
Coordinating Council (FRCC), and show that it generates grids
with similar structural and spatial properties to these grids. To 
the best of our knowledge, this is the first attempt to consider 
the spatial distribution of the nodes and lines and its importance
in generating synthetic power grids. 

Index Terms —Power Grids, Structural Properties, Synthetic 
Networks, Spatial Networks, Data Mining. 


I. Introduction 

The design of algorithms and methods for enhancing the power grid (namely, making it smarter) has drawn tremendous attention over the past decade […]. These efforts have focused on challenges stemming from renewable generation interconnection […], Phasor Measurement Unit (PMU) placement […], transmission expansion planning […], and vulnerability analysis […]. The development of algorithms for coping with these challenges requires performance evaluation with real grid topologies. However, in order to avoid exposing vulnerabilities, the topologies of the power transmission networks, and particularly the locations of the substations and the lines, are usually not publicly available or are hard to obtain.

There are only very few and limited test cases and real-world power grid datasets that are publicly and freely available. These include the IEEE test cases […], the National Grid UK […], the Polish grid […], and an approximate model of the European interconnected system […]. To the best of our knowledge, among these, National Grid UK is the only publicly available dataset with geographical locations. Even if the data were available, it would be unwise to publish vulnerability results that are based on real topologies, due to the enormous cost of grid enhancements. On the other hand, it was recently shown that simple random graph models cannot be used to generate grids with appropriate structural and spatial characteristics […] (for more details, see Section II). Therefore, in this paper we design an algorithm for generating synthetic networks with structural and spatial properties similar to those of real power grids. Such synthetic networks can be used for the evaluation of various methods and techniques.

Fig. 1: The North American Electric Reliability Corporation (NERC) regional entities and the National Electricity Transmission Grid of Mexico (NETGM). Different reliability corporations/councils are marked with different colors.

To demonstrate the algorithm design and to evaluate its performance, we focus on the transmission networks of the North American and Mexican power grids (see NERC and NETGM in Fig. 1), using data that we obtained from the Platts Geographic Information System (GIS) […]. We consider one of the two major interconnections, the Western Interconnection (WI) (see Fig. 2), which includes the Western Electricity Coordinating Council in the United States (WECC) and Canada (WECCC) (see Fig. 1 for their coverage areas). Moreover, we consider two regional entities that operate under the Eastern Interconnection (EI), which is the other major interconnection: the SERC Reliability Corporation (SERC), which is as large as the WI, and the Florida Reliability Coordinating Council (FRCC), which is much smaller than the WI. To the best of our knowledge, this is the first time that the entire dataset of the North American and Mexican grids, as well as those of SERC and FRCC, are processed and analyzed.¹

For the entire North American and Mexican grid as well as for the WI, SERC, and FRCC, we consider four metrics that capture the networks' structural properties: average path length, clustering coefficient, degree distribution of the nodes, and length distribution of the lines. The first three metrics are very common […], [22]. However, to the best of our knowledge, the length distributions of the lines have not been thoroughly studied before. These distributions are particularly important, since the physical properties of a line (e.g., admittance and type) are directly correlated with its length [23], and hence, the distributions directly impact the performance of various algorithms.

Fig. 2: The Western Interconnection (WI) power grid with 14,302 substations (nodes) and 18,769 lines (edges).

¹Partial analysis of the WI dataset has been conducted before (see Section …).

Motivated by the results of the structural properties' analysis, we present the Geographical Network Learner and Generator (GNLG) Algorithm for generating a network with properties similar to those of a given grid. First, using a Gaussian Mixture Model (GMM), the algorithm estimates the density of the node positions and uses the obtained parameters to generate a set of nodes with a spatial distribution similar to that of these nodes (the algorithm uses the Bayesian Information Criterion (BIC) to find the best number of clusters for the GMM). Then, the GNLG Algorithm uses two procedures, which are inspired by the historical evolution of power grids, to connect the generated nodes. Particularly, since the two main design considerations of the grid are connectivity and robustness, the algorithm obtains a spanning tree of the nodes to provide connectivity and then adds more edges to the network graph to increase its robustness. The addition of edges is tuned to create a synthetic network with properties that are similar to those of a given network.


To evaluate the performance of the GNLG Algorithm, 
we use it to generate networks similar to the WI, SERC, 
and FRCC. We show that by adapting a number of tunable
parameters, the GNLG Algorithm can generate synthetic networks
with similar structural and spatial properties to these
power grid networks. Overall, we believe that by adapting 
the algorithm’s tunable parameters, it is possible to generate 
synthetic networks similar to any given power grid network. 


This paper is organized as follows. Section II reviews related work. Section III describes the dataset and presents the metrics for the different grids. Section IV describes the GNLG Algorithm, and Section V numerically evaluates its performance. We conclude and discuss future research directions in Section VI.

II. Related Work

The structural properties of various power grids (e.g., in North America, some European countries, and Iran) were studied in […], [26], […]. Most of these studies considered one or two properties (e.g., average degree, degree distribution, average path length, and clustering coefficient) and computed them for a given power grid. In some cases (e.g., [13], […]), a certain class of graphs was suggested as a good representative of a power grid network, based on one or two structural properties. For example, Watts and Strogatz […] suggested the small-world graph as a good representative, based on the shortest path lengths between nodes and the clustering coefficient of the nodes. Barabasi and Albert […] showed that scale-free graphs are better representatives based on the degree distribution. However, by comparing the WI with these models, Cotilla-Sanchez et al. […] showed that none of them can represent the WI properly.

More detailed models that are specifically tailored to the power grid characteristics were proposed in […], [29], but they did not consider the nodes' spatial distribution or the length distribution of the lines. The spatial distribution of the nodes is correlated with the length of the lines, and as mentioned above, it is important to consider line lengths when designing a method for synthetic power grid generation. While there are several models for generating spatial networks […], most of them were not designed to generate networks with properties similar to power grid networks. To the best of our knowledge, this paper is the first to consider the spatial distribution of the nodes in power grids and its importance in generating synthetic networks with similar structural properties.

III. Preliminaries and Structural Properties

In this section, we study the structural properties of the entire North American and Mexican grid (denoted by NA&M) as well as of the WI, SERC, and FRCC grids. We obtained the data from the Platts GIS […] and converted the longitude-latitude coordinates to planar (x, y) coordinates using the great-circle distance method. Since the files containing substations and the files containing lines are not always consistent, we extracted the coordinates of the substations from the endpoint coordinates of the lines. We then used the geographical coordinates of the substations and the lines to construct graphs whose nodes and edges represent substations and lines, respectively. We used the map of the reliability corporations'/councils' boundaries to divide the graph into regional entities (as in Fig. 1). To the best of our knowledge, besides […], where an approximation of the WI graph was extracted from the Platts GIS dataset for simulations, this is the first time that this dataset is processed and analyzed.
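The great-circle distance mentioned above is commonly computed with the haversine formula. The sketch below shows only that distance computation, as an illustration: the Earth radius of 6371 km and the function name `great_circle_km` are our assumptions, and the text does not specify which planar projection was then used to obtain the (x, y) coordinates.

```python
import math

# Assumption: mean Earth radius in km; the paper does not state the value it used.
EARTH_RADIUS_KM = 6371.0


def great_circle_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two (latitude, longitude) points given in
    degrees, computed with the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# One degree of longitude along the equator.
print(round(great_circle_km(0.0, 0.0, 0.0, 1.0), 2))  # prints 111.19
```

Because the haversine form stays well conditioned for small separations, it is a reasonable choice when, as here, many lines are only a few kilometers long.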

In addition to the numbers of nodes and edges, we use four metrics for classifying the structural properties of these networks: average path length, clustering coefficient, degree distribution of the nodes, and length distribution of the lines. Table I includes these metrics for the NA&M, WI, SERC, and FRCC grids.

Fig. 3: The degree distribution of the nodes in the NA&M, WI, SERC, and FRCC grids (in log-log scale). Linear regression lines with slopes ζ = −4.28, ζ = −3.48, ζ = −3.93, and ζ = −2.76, respectively, are fitted to the tail distribution of the degrees.

TABLE I: Summary of the structural properties of the NA&M, WI, SERC, and FRCC grids.

Network | NA&M | WI | SERC | FRCC
Number of Nodes (n) | 55,231 | 14,302 | 12,946 | 1,312
Number of Edges (m) | 70,088 | 18,769 | 16,658 | 1,780
Average Path Length (L) | 26.66 | 17.33 | 19.71 | 11.68
Clustering Coefficient (C) | 0.049 | 0.049 | 0.049 | 0.075
Degree Distribution (ζ) | −4.28 | −3.48 | −3.93 | −2.76

Notation. We denote the WI, SERC, and FRCC power grid transmission networks by graphs G_WI, G_SERC, and G_FRCC, respectively. For each network, n and m denote the numbers of nodes and edges, d_i denotes the degree of node i, and p_i ∈ ℝ² denotes its position. We define ρ as the average Euclidean distance of a node from its N nearest neighbors. We use the prime symbol (') to denote the values for a generated network (e.g., G'_WI denotes the generated network). All the logarithms in this paper are natural logarithms. All the geographical distances in this paper are Euclidean distances (i.e., ‖p_i − p_j‖₂ is the distance between nodes i and j).

A. Average path length

The average path length, denoted by L, is one of the common metrics used for classifying graphs. It is defined as the number of edges in the shortest path between two nodes, averaged over all pairs of vertices:

$$L = \frac{1}{n(n-1)} \sum_{i,j \in V} \mathrm{dist}(i,j),$$

where dist(i, j) is the number of edges in the shortest path between nodes i and j. As can be seen in Table I, the average path length in all four networks is O(log(n)), which is very small and suggests that these networks have the small-world property.

B. Clustering coefficient

An important metric is the clustering coefficient, denoted by C and defined as follows. For each node i with degree d_i, at most d_i(d_i − 1)/2 edges can exist between its neighbors N(i). Let C_i denote the fraction of these allowable edges that actually exist:

$$C_i = \frac{\left|\{\{r,s\} : r,s \in N(i),\ \{r,s\} \in E\}\right|}{d_i(d_i-1)/2}.$$

Then, averaging C_i over all the nodes yields C = Σ_i C_i / n. As can be seen in Table I, the clustering coefficient for all four networks is very small.
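A minimal sketch of the two metrics above, assuming an undirected, connected graph stored as a list of neighbor sets (a representation chosen for illustration; the helper names are hypothetical). L uses one breadth-first search per node, which is O(n(n + m)) overall and therefore slow in pure Python for grids of the WI's size; C follows the definition of C_i directly, taking C_i = 0 for nodes of degree below 2, where the fraction is undefined (our assumption).

```python
from collections import deque
from itertools import combinations
from typing import List, Set


def average_path_length(adj: List[Set[int]]) -> float:
    """L = (1 / (n(n-1))) * sum of hop distances over ordered pairs i != j.
    Assumes the graph is connected (unreachable pairs would silently be skipped)."""
    n = len(adj)
    total = 0
    for source in range(n):
        # Breadth-first search gives hop counts from `source` in an unweighted graph.
        dist = [-1] * n
        dist[source] = 0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if dist[v] < 0:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(d for d in dist if d > 0)
    return total / (n * (n - 1))


def clustering_coefficient(adj: List[Set[int]]) -> float:
    """C = (1/n) * sum_i C_i, where C_i is the fraction of neighbor pairs of i that are
    linked. Assumption: C_i = 0 for nodes with degree < 2."""
    total = 0.0
    for nbrs in adj:
        d = len(nbrs)
        if d < 2:
            continue
        links = sum(1 for r, s in combinations(nbrs, 2) if s in adj[r])
        total += links / (d * (d - 1) / 2)
    return total / len(adj)


# Triangle 0-1-2 with a pendant node 3 attached to node 2.
adj = [{1, 2}, {0, 2}, {0, 1, 3}, {2}]
print(average_path_length(adj), clustering_coefficient(adj))
```

For this toy graph the formulas give L = 16/12 ≈ 1.33 and C = (1 + 1 + 1/3 + 0)/4 = 7/12 ≈ 0.58, which can be checked by hand.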


C. Degree distribution of the nodes 

The degree distribution of the nodes is another important metric for classifying graphs (e.g., scale-free networks). Fig. 3 shows the degree distribution of the nodes in the NA&M, WI, SERC, and FRCC grids in log-log scale. The degree-one nodes in these networks usually correspond to power plants or small towns. These figures may suggest that the tail of the degree distribution follows a power-law distribution in all four networks. However, following [33], and since these networks are finite, we do not have enough statistical evidence to support the power-law hypothesis. Therefore, we only use the slope (ζ) of the linear regression line fitted to the tail distribution for comparison purposes.
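The text does not spell out exactly how the tail regression is performed. The following sketch is one plausible reading, not the authors' procedure: an ordinary least-squares fit of ln(empirical frequency) against ln(degree), restricted to degrees of at least `d_min` (here 3, matching the "degree greater than 2" fits reported for the WI in Section V). The name `tail_slope` is hypothetical.

```python
import math
from collections import Counter
from typing import Iterable


def tail_slope(degrees: Iterable[int], d_min: int = 3) -> float:
    """Least-squares slope of ln(empirical frequency) vs ln(degree) for degrees >= d_min.

    Hypothetical helper: `d_min` and the use of frequencies (rather than, e.g., the
    complementary CDF) are assumptions, not details stated in the text."""
    degrees = list(degrees)
    counts = Counter(degrees)
    xs, ys = [], []
    for d, c in sorted(counts.items()):
        if d >= d_min:
            xs.append(math.log(d))
            ys.append(math.log(c / len(degrees)))
    if len(xs) < 2:
        raise ValueError("need at least two distinct tail degrees to fit a line")
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```

Because both axes are logarithmic, the slope does not depend on the logarithm base, and frequencies that follow c·d^ζ exactly are recovered with slope ζ; as noted above, such a fit is used only for comparison, not as evidence of a power law.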

In Section V, we use the Kolmogorov-Smirnov (KS) statistic [34] to compare the degree distribution of the nodes in a given network and a generated network. If P(x) and Q(x) are two Cumulative Distribution Functions (CDFs), the KS statistic between them is defined as follows:

$$D_{KS} = \max_x |P(x) - Q(x)|.$$
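A small sketch of the two-sample version of D_KS, assuming P and Q are the empirical CDFs of two degree samples (for example, the node degrees of a given and a generated network); `ks_statistic` is a hypothetical helper.

```python
from bisect import bisect_right
from typing import Sequence


def ks_statistic(sample_p: Sequence[float], sample_q: Sequence[float]) -> float:
    """Two-sample KS statistic D_KS = max_x |P(x) - Q(x)| between empirical CDFs.

    Both empirical CDFs are right-continuous step functions that only jump at observed
    values, so evaluating them at the pooled observations is enough to find the maximum."""
    p, q = sorted(sample_p), sorted(sample_q)
    pooled = sorted(set(p) | set(q))
    return max(abs(bisect_right(p, x) / len(p) - bisect_right(q, x) / len(q))
               for x in pooled)


print(ks_statistic([1, 1, 2, 3], [1, 2, 2, 3]))  # prints 0.25
```

Evaluating only at observed values keeps the computation exact for integer-valued data such as node degrees, where ties are common.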


D. Length distribution of the lines

As mentioned above, the length distribution of the lines is one of the important properties that should be preserved in synthetic power grid generation. Fig. 4 shows the length distribution of the lines in the NA&M, WI, SERC, and FRCC grids. The length distribution of the lines in the NA&M grid in log-log scale is shown in Fig. 5. The lengths' statistics appear in Table II.²

The line lengths in Figs. 4 and 5 are the actual lengths of the power lines (these lines are not necessarily straight lines between two substations). To enable the comparison between the length distributions of the lines in the real and generated networks, in Section V we use the point-to-point Euclidean distances to represent the line lengths in both the real and the generated networks. Table II includes the statistics of both the actual line lengths and the lengths of the straight lines between the substations, in order to demonstrate the differences between the two metrics.

In Section V, we use the Kullback-Leibler (KL) divergence to measure the similarity between the length distribution of the lines in a given network and a generated network. The KL-divergence is a non-symmetric measure of the difference between two probability density functions p and q. Specifically, the KL-divergence of q from p, denoted D_KL(p‖q), is a measure of the information lost when q is used to approximate p:

$$D_{KL}(p \,\|\, q) = \int_{-\infty}^{+\infty} p(x) \ln \frac{p(x)}{q(x)} \, dx.$$

To estimate the KL-divergence between distributions, we use the FNN library in R, which utilizes the method introduced in [35] for estimating the KL-divergence between two distributions using their samples.

Fig. 4: The distributions of the actual line lengths (in km) in the NA&M, WI, SERC, and FRCC grids (the lengths' statistics appear in Table II). Nonparametric distribution fits to the log length distributions are shown in blue.

Fig. 5: The distribution of the actual line lengths (in km) in the NA&M grid in log-log scale. A linear regression line with slope −1.61 is fitted to the tail distribution of the lengths.

TABLE II: Statistics of the actual line lengths in the NA&M, WI, SERC, and FRCC grids and of the corresponding straight lines (Euclidean distances) between substations in those grids (in km). The statistics of the straight lines are shown in the grey cells.

Network | NA&M | WI | SERC | FRCC
Mean | 15.46 | 16.63 | 13.29 | 12.82 | 14.30 | 15.78 | 11.39 | 9.95
Standard Deviation | 32.55 | 43.91 | 22.29 | 20.14 | 30.68 | 40.78 | 17.90 | 15.6
Maximum | 1,714.82 | 1,380.35 | 1,714.82 | 1,380.35 | 795.44 | 409.92 | 282.74 | 226.25

²As can be seen in Figs. 4 and 5, there are some very short lines (≈30 m) in the considered networks. We checked the dataset to verify the credibility of these lines and did not find any issues (these lines are categorized as below-230 kV lines).

Algorithm 1: Geographical Network Learner and Generator (GNLG)

Input: G, {p_i}_{i=1}^n, and parameters κ, α, β, γ > 0 and N ∈ ℕ.

1: Generate a set of nodes with a spatial distribution similar to that of the nodes in G using the SDNG Procedure (Subsection IV-A).
2: Connect the generated nodes using the TWST Procedure (Subsection IV-B).
3: Add more edges to the generated graph using the Reinforcement Procedure (Subsection IV-B).
4: return the generated graph G'.

IV. Generating A Synthetic Network


In this section, we introduce the Geographical Network Learner and Generator (GNLG) Algorithm (Algorithm 1) for generating a synthetic network similar to a given network. The algorithm uses the Gaussian Mixture Model (GMM) for density estimation of the node positions and generates a set of nodes with a spatial distribution similar to that of the nodes in a given network (the SDNG Procedure described in Subsection IV-A). Then, it connects the nodes using two procedures whose design principles are inspired by the historical evolution of the grids (the TWST and Reinforcement procedures described in Subsection IV-B). The GNLG Algorithm can be applied to any network; the important part is tuning the parameters to a given network. In the following subsections, we describe the building blocks of the GNLG Algorithm and use the WI to demonstrate the algorithm design and operation. Then, in Section V, we evaluate the algorithm using the WI, SERC, and FRCC grids.


























































Procedure 1: Spatially Distributed Nodes Generator (SDNG)

Input: G, {p_i}_{i=1}^n.

1: Fit a GMM to {p_i}_{i=1}^n to cluster them into the number of clusters c that maximizes the BIC.
2: For all i = 1, ..., n, sample z_i from the categorical probability distribution π obtained from the GMM.
3: For all i, sample p'_i from the probability distribution N(μ_{z_i}, Σ_{z_i}) obtained from the GMM.
4: return {p'_i}_{i=1}^n.


A. Node positions 

We now introduce the Spatially Distributed Nodes Generator (SDNG) Procedure (Procedure 1) for generating a set of nodes with a spatial distribution similar to that of the nodes in a given network. The node positions are correlated with the population and geographical properties (e.g., Fig. …). Thus, the nodes can be clustered into groups based on their geographical proximity. Mixture models, and in particular Gaussian Mixture Models (GMMs), are commonly used for clustering and density estimation [36]. Hence, the SDNG Procedure uses the GMM for clustering the positions and uses the BIC to find the best number of clusters (c). It obtains the mean and covariance matrix (μ_j, Σ_j) of the points in clusters j = 1, ..., c, along with the categorical probability of the clusters π = (π_1, ..., π_c). Then, it uses these parameters to generate n nodes with a spatial distribution similar to that of the nodes in a given network.

For implementing the SDNG Procedure, we used the mclust library in R [37] to apply the GMM to our dataset. This library uses the Expectation Maximization (EM) algorithm to fit a GMM and provides the Bayesian Information Criterion (BIC) for the selected number of clusters. Clustering the nodes in the WI into 55 clusters results in the maximum BIC. Hence, the SDNG Procedure clusters the WI into c = 55 clusters. As can be seen in Fig. 7, the distribution of the generated nodes appears very similar to the distribution of the nodes in the WI.

Notice that for a given network, step 1 in the Procedure 
should be executed only once. Then, having the fitted GMM 
parameters, the procedure can be used to generate several 
instances of nodes with similar spatial distribution to the nodes 
in the given network. Hence, once the parameters are available, 
synthetic grids can be generated with no need to access the 
real grid data. 
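Because only (π, μ_j, Σ_j) are needed after step 1, steps 2–4 can be sketched independently of the real data. The sketch below assumes planar positions and already-fitted parameters; in the paper these come from mclust in R, and in Python they could come, for instance, from scikit-learn's `GaussianMixture` (its `weights_`, `means_`, and `covariances_` attributes and `bic()` method), which we mention only as an alternative. The function name and the example parameters are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
Cov2 = Tuple[Tuple[float, float], Tuple[float, float]]


def sample_sdng_nodes(n: int, weights: Sequence[float], means: Sequence[Point],
                      covs: Sequence[Cov2], seed: Optional[int] = None) -> List[Point]:
    """Steps 2-4 of the SDNG Procedure for planar positions, given already-fitted GMM
    parameters (pi_j, mu_j, Sigma_j). Fitting (step 1) is assumed to be done elsewhere."""
    rng = random.Random(seed)
    # Step 2: z_i ~ Categorical(pi) for every new node.
    labels = rng.choices(range(len(weights)), weights=weights, k=n)
    nodes = []
    for z in labels:
        mx, my = means[z]
        (a, b), (_, d) = covs[z]
        # Step 3: p'_i ~ N(mu_z, Sigma_z), via the Cholesky factor of the 2x2 covariance.
        l11 = math.sqrt(a)
        l21 = b / l11
        l22 = math.sqrt(d - l21 * l21)
        g1, g2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        nodes.append((mx + l11 * g1, my + l21 * g1 + l22 * g2))
    return nodes  # Step 4


# Two hypothetical clusters: weights, means (km), and covariance matrices (km^2).
nodes = sample_sdng_nodes(500, [0.3, 0.7], [(0.0, 0.0), (100.0, 50.0)],
                          [((25.0, 5.0), (5.0, 16.0)), ((9.0, 0.0), (0.0, 9.0))],
                          seed=1)
```

Calling the function again with a different seed yields a new node set from the same fitted parameters, which mirrors the observation above that the real grid data are no longer needed once the parameters are available.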

B. Connections between the nodes 

We introduce two procedures (steps 2 and 3 in the GNLG Algorithm) for connecting the generated nodes. Their design is inspired by the historical evolution of power grids. The two main design considerations of the grid are (i) connectivity and (ii) robustness. Therefore, we first present the Tunable Weight Spanning Tree (TWST) Procedure for finding a spanning tree to ensure connectivity. We then describe the Reinforcement Procedure for adding more edges to ensure the network robustness, as well as for tuning the structural properties of the synthetic network to resemble those of a given network.

1) Connectivity: In order for the power grid to operate, the substations (nodes) should be connected. Due to construction costs, in the real world new substations are usually connected to the nearest substation in the existing grid. Since the power grids have evolved gradually and locally, they do not necessarily contain the Minimum weight Spanning Tree (MST) of the nodes in the plane (the weight of a spanning tree T = (V_T, E_T) is the sum of the edge lengths in T: W_T = Σ_{{i,j}∈E_T} ‖p_i − p_j‖). Hence, we do not focus on finding the MST. Instead, we present the TWST Procedure (Procedure 2), which imitates the gradual grid evolution. It is a low-complexity procedure for finding a spanning tree with a tunable weight.

Fig. 6: An example of clustering the nodes in the WI into 10 clusters using GMM.

Fig. 7: A set of nodes, generated using the SDNG Procedure, with a spatial distribution similar to that of the nodes in the WI.

The procedure uses the average node location, denoted by p̄' = (1/n) Σ_{i=1}^n p'_i, and orders the nodes in n rounds (see steps 2–4) to obtain a permutation of indices σ: {1, 2, ..., n} → {1, 2, ..., n}. At round i, it samples a node j from the nodes that were not already sampled with probability proportional to ‖p̄' − p'_j‖^(−κ), where κ is a parameter. It then sets σ(i) ← j. In steps 5–6, it connects each node σ(i) to its nearest neighbor σ(j*) such that j* < i.

The procedure results in a tree whose weight highly depends on the ordering of the nodes, and thereby on κ. Moreover, there is a specific ordering of the nodes for which the procedure provides the MST (the nodes should be ordered according to their appearance in Prim's Algorithm [38] for finding the MST). Specifically, κ determines the difference between the obtained spanning tree and the MST. Fig. 8(a) shows the relationship between the weight of the obtained tree and κ. When κ = 0, the nodes are ordered randomly and the weight of the obtained spanning tree significantly differs from the MST's weight. However, as κ increases, the weight of the spanning tree decreases. When κ is very large, the nodes are ordered based on their distance from the average location, and therefore, the obtained spanning tree's weight is close to the MST's (shown by the blue dash-dot line).

Fig. 8: (a) The weight of the spanning tree (in 10^… km) obtained by the TWST Procedure on the nodes shown in Fig. … vs. κ. Each point is the average over 10 generated trees. The blue dash-dot line shows the weight of the MST and the red dashed line shows the weight of the obtained spanning tree for κ = ∞. (b) The average path length in the spanning tree obtained by the TWST Procedure on the nodes shown in Fig. … vs. κ. Each point is the average over 10 generated trees. The average path length in a specific MST (an MST may not be unique) is 520. The red dashed line shows the average path length in the obtained spanning tree for κ = ∞.

Procedure 2: Tunable Weight Spanning Tree (TWST)

Input: {p'_i}_{i=1}^n, and parameter κ.

1: A = {1, ..., n}; σ is an empty array of size n.
2: for i = 1, ..., n do
3:    Sample a node j from A such that the probability of sampling node j is ‖p'_j − p̄'‖^(−κ) / Σ_{a∈A} ‖p'_a − p̄'‖^(−κ).
4:    σ(i) ← j, A ← A \ {j}.
5: for i = 2, ..., n do
6:    Connect node σ(i) to node σ(j*) such that j* = argmin_{j<i} ‖p'_{σ(i)} − p'_{σ(j)}‖.

Fig. 8(b) shows the relationship between κ and the average path length in the obtained tree. As κ increases, the average path length increases. For large κ, this increase is more significant. Moreover, the average path length in an MST (520) is significantly larger than in trees obtained by the TWST Procedure. Overall, Figs. 8(a) and 8(b) suggest that selecting a relatively small κ results in a spanning tree with a smaller average path length than the MST and with a reasonable total weight. We show in Section V that for generating a network similar to the WI, κ = 2.5 is a relatively good choice.
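A compact sketch of the TWST Procedure, assuming planar positions given as (x, y) tuples. Sampling is done with log-weights shifted by their maximum, a numerical detail we added so that a large κ does not overflow; the small distance floor and the names `twst` and `tree_weight` are likewise our assumptions. The nearest-neighbor search in steps 5–6 is brute force, O(n²).

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def twst(points: Sequence[Point], kappa: float,
         seed: Optional[int] = None) -> List[Tuple[int, int]]:
    """Sketch of the TWST Procedure: returns the n - 1 edges of a spanning tree."""
    rng = random.Random(seed)
    n = len(points)
    center = (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)
    # ln of the unnormalized weight ||p'_j - center||^(-kappa). The 1e-9 floor is our
    # assumption, to avoid ln(0) if a node sits exactly at the average location.
    log_w = [-kappa * math.log(max(math.dist(p, center), 1e-9)) for p in points]

    # Steps 1-4: draw the ordering sigma without replacement.
    remaining = list(range(n))
    order: List[int] = []
    for _ in range(n):
        top = max(log_w[a] for a in remaining)
        # Shifting by the maximum keeps exp() in range for large kappa.
        weights = [math.exp(log_w[a] - top) for a in remaining]
        k = rng.choices(range(len(remaining)), weights=weights, k=1)[0]
        order.append(remaining.pop(k))

    # Steps 5-6: attach each node to its nearest already-ordered node.
    edges = []
    for i in range(1, n):
        u = order[i]
        v = min(order[:i], key=lambda w: math.dist(points[u], points[w]))
        edges.append((u, v))
    return edges


def tree_weight(points: Sequence[Point], edges: Sequence[Tuple[int, int]]) -> float:
    """W_T: the sum of the Euclidean edge lengths of the tree."""
    return sum(math.dist(points[u], points[v]) for u, v in edges)
```

Calling `tree_weight` on trees produced for several values of κ over the same points is one way to explore the trend shown in Fig. 8(a); we have not checked this sketch against the authors' implementation.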

2) Robustness: We present the Reinforcement Procedure, whose objective is to increase the robustness of the generated network and adjust its properties (e.g., L and C) to resemble those of a given network. The procedure is based on three observations: (i) the degree distributions of power grids are very similar to those of scale-free networks, but grids have fewer degree 1 and 2 nodes and do not have very high degree nodes (e.g., Fig. 3), (ii) it is inefficient and unsafe for the power grids to include very long lines (e.g., Figs. 4 and 5), and (iii) nodes in denser areas are more likely to have higher degrees. The last observation is demonstrated by Fig. 9, where as the degree increases, ρ decreases³ (i.e., the density around a node increases).

Fig. 9: The relationship between the degree of a node and its average ρ with N = 10, for the nodes in the WI (the red line is the linear regression fit to the data points).

Procedure 3: Reinforcement

Input: {p'_i}_{i=1}^n, m, and parameters α, β, γ, η > 0, N ∈ ℕ.

1: For each node i, compute ρ_i (the average distance of node i from its N nearest neighbors).
2: for count = 1 to m − n + 1 do
3:    if large network: From all nodes with degree less than 3, sample node i with probability ∝ ρ_i^(−α).
4:    if small network: Sample node i with probability ∝ d_i^(−η) ρ_i^(−α).
5:    Connect node i to node j sampled from all other nodes with probability ∝ ‖p'_i − p'_j‖^(−β) d_j^γ.

The Reinforcement Procedure aims to create a network whose properties are similar to those observed above. Hence, it repeats the following steps m − n + 1 times: (1) it selects a low-degree node in a dense area (observations (i) and (iii)), and (2) it connects that node to a high-degree node (as in the preferential attachment model […]) which is also nearby (distance was not considered in […]) (observations (i) and (ii)).

To select a low-degree node in a dense area, the Reinforcement Procedure samples a node i with probability ∝ d_i^(−η) ρ_i^(−α). However, as can be seen in Fig. 3, the distribution of the degree 1 and 2 nodes is almost equal in the WI and SERC grids. Hence, for large networks, the procedure only considers degree 1 and 2 nodes and selects a node among them with probability ∝ ρ_i^(−α). α and η are the tunable parameters.

To connect the node sampled in the previous step to a high-degree but nearby node, in the second step, the Reinforcement Procedure connects node i to a node j sampled from all other nodes with probability ∝ ‖p'_i − p'_j‖^(−β) d_j^γ. This implies that node i preferentially connects to a high-degree node, unless the high-degree node is too far, in which case it is desirable to connect to a low-degree but nearby node. This is very similar to the model introduced in […], […]. However, here we only use these probabilities for sampling and do not use them for connecting every pair of nodes.

³Recall that ρ is the average Euclidean distance of a node from its N nearest neighbors.

Fig. 10: …generated based on the WI grid using the GNLG Algorithm with κ = 2.5, α = 1, β = 3.2, γ = 2.5, and N = 10.

We note that β determines the length distribution of the new lines and γ determines the likelihood of the existence of high-degree nodes. If β is large compared to γ, then new edges connect nearby nodes, thereby resulting in a large clustering coefficient and a large average path length. If γ is large compared to β, then new edges connect nodes to high-degree nodes regardless of their distance, thereby resulting in very high degree nodes and long edges. Hence, there should be a balance between the β and γ values. We show in Section V that for generating a network similar to the WI, β = 3.2 and γ = 2.5 are relatively good choices.
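A sketch of the Reinforcement Procedure under several stated assumptions: the caller decides whether the network counts as "large" (the text gives no threshold), ρ_i is computed by brute force, the input tree is assumed to give every node degree at least 1 (as after the TWST step), and candidate partners that are already neighbors of node i are excluded so the graph stays simple. The function and parameter names are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence, Set, Tuple

Point = Tuple[float, float]


def reinforce(points: Sequence[Point], tree_edges: Sequence[Tuple[int, int]], m: int,
              alpha: float, beta: float, gamma: float, eta: float, N: int,
              large_network: bool, seed: Optional[int] = None) -> List[Tuple[int, int]]:
    """Sketch of the Reinforcement Procedure: adds m - n + 1 edges to a spanning tree."""
    rng = random.Random(seed)
    n = len(points)
    adj: List[Set[int]] = [set() for _ in range(n)]
    for u, v in tree_edges:
        adj[u].add(v)
        adj[v].add(u)

    # Step 1: rho_i = mean distance to the N nearest neighbors (brute force).
    rho = []
    for i in range(n):
        nearest = sorted(math.dist(points[i], points[j]) for j in range(n) if j != i)[:N]
        rho.append(max(sum(nearest) / len(nearest), 1e-9))  # floor: our assumption

    edges = list(tree_edges)
    for _ in range(m - n + 1):
        # Steps 3-4: pick a low-degree node in a dense area.
        if large_network:
            # Assumption: fall back to all nodes if no node of degree < 3 is left.
            cand = [i for i in range(n) if len(adj[i]) < 3] or list(range(n))
            w = [rho[i] ** -alpha for i in cand]
        else:
            cand = list(range(n))
            w = [max(len(adj[i]), 1) ** -eta * rho[i] ** -alpha for i in cand]
        i = rng.choices(cand, weights=w, k=1)[0]

        # Step 5: pick a nearby, high-degree partner. Excluding current neighbors
        # (to keep the graph simple) is our assumption, not stated in the text.
        targets = [j for j in range(n) if j != i and j not in adj[i]]
        if not targets:
            continue  # node i is already adjacent to every other node
        w = [max(math.dist(points[i], points[j]), 1e-9) ** -beta * len(adj[j]) ** gamma
             for j in targets]
        j = rng.choices(targets, weights=w, k=1)[0]
        adj[i].add(j)
        adj[j].add(i)
        edges.append((i, j))
    return edges
```

The two factors in step 5 make the trade-off discussed above explicit: raising `beta` shortens the new lines, while raising `gamma` concentrates them on already high-degree nodes.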

V. Evaluation 

In this section, we use the GNLG Algorithm to generate networks similar to the WI, SERC, and FRCC grids. We evaluate the structural properties of the obtained networks and show that they have properties similar to those of the real networks.


A. WI 


As mentioned in Section IV-B, the parameters κ, α, β, γ, and N can be used to tune the structural properties of the obtained network. Therefore, we conducted several numerical experiments in which the parameters were adapted and the structural properties were evaluated. We observed empirically that the following parameter values provide a network with properties similar to those of the WI: κ = 2.5, α = 1, β = 3.2, γ = 2.5, and N = 10. Moreover, as mentioned in Section IV-A, the BIC was used to determine the number of clusters (c = 55).

The nodes generated by the SDNG Procedure were shown in Fig. 7. The network obtained by the GNLG Algorithm appears in Fig. 10 and visually resembles the WI. To study the structural similarity between the obtained network G'_WI and G_WI, we evaluated G'_WI based on the metrics described in Section III. The clustering coefficient and the average path length of G'_WI are C' = 0.052 and L' = 17.40, respectively, and are very close to those of G_WI (C = 0.049 and L = 17.33).

Fig. 11: (a) G_WI, (b) G'_WI. The degree distribution of the nodes in G_WI and G'_WI (in log-log scale). Linear regression lines with slopes ζ = −3.48 and ζ' = −3.99 are fitted to the distributions of the nodes with degree greater than 2 in G_WI and G'_WI, respectively. The KS statistic between the degree distributions is 0.047.

Fig. 12: (a) G_WI, (b) G'_WI; horizontal axes: log length. The length (in km) distribution of the point-to-point lines in G_WI and G'_WI and nonparametric distribution fit (shown in blue). The KL-divergence between the length distributions in G_WI and G'_WI is 0.14.


Fig. 11 shows the degree distribution of the nodes in G'_WI. As can be seen, the slope of the regression line fitted to the tail of the distribution is -3.99, which is similar to that of G_WI (-3.48). Moreover, the KS statistic between the cumulative degree distributions in G_WI and G'_WI is 0.047, indicating that the degree distributions are similar. Fig. 12 shows the length distribution of the lines in G'_WI. Since the GNLG Algorithm uses straight lines to connect the nodes, we compare the length distribution of the lines in G'_WI with the length distribution of the straight point-to-point lines in G_WI. The KL-divergence between the length distributions of the lines in G_WI and G'_WI is D_KL = 0.14, indicating that the distributions are similar.
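The two distances used above can be sketched as follows. The KS statistic is the largest vertical gap between the two empirical cumulative degree distributions. For the KL-divergence, this sketch assumes a Gaussian kernel density estimate (one possible nonparametric fit; the paper's exact fit, bandwidth, and scale are not specified here) evaluated on a shared grid of log-lengths and renormalized to discrete distributions. The function names, the bandwidth of 0.25, and the 200-point grid are hypothetical choices.

```python
import bisect
import math


def ks_statistic(a: list[float], b: list[float]) -> float:
    """Two-sample KS statistic: max |F_a(x) - F_b(x)| over all observed values x."""
    a, b = sorted(a), sorted(b)
    gap = 0.0
    # The empirical CDFs are step functions, so the maximum gap occurs at a sample value.
    for x in set(a) | set(b):
        fa = bisect.bisect_right(a, x) / len(a)
        fb = bisect.bisect_right(b, x) / len(b)
        gap = max(gap, abs(fa - fb))
    return gap


def kde_on_grid(samples: list[float], grid: list[float], bandwidth: float) -> list[float]:
    """Gaussian KDE evaluated at the grid points, renormalized to sum to 1."""
    dens = [sum(math.exp(-0.5 * ((g - s) / bandwidth) ** 2) for s in samples) for g in grid]
    total = sum(dens)
    return [d / total for d in dens]


def kl_divergence(lengths_p: list[float], lengths_q: list[float], bandwidth: float = 0.25,
                  points: int = 200, eps: float = 1e-12) -> float:
    """Discrete D_KL(P || Q) between KDE fits of two sets of log line lengths."""
    logp = [math.log(x) for x in lengths_p]
    logq = [math.log(x) for x in lengths_q]
    lo = min(logp + logq) - 3 * bandwidth
    hi = max(logp + logq) + 3 * bandwidth
    grid = [lo + i * (hi - lo) / (points - 1) for i in range(points)]
    p = kde_on_grid(logp, grid, bandwidth)
    q = kde_on_grid(logq, grid, bandwidth)
    # eps guards against division by zero where Q's estimated density vanishes.
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)
```

Here P plays the role of the original grid and Q of the generated one; KL-divergence is asymmetric, so the direction should match whichever convention the reported values use.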


Table III summarizes the structural properties of G_WI and five instances generated by the GNLG Algorithm. The results indicate that the Algorithm can generate synthetic networks with structural properties similar to those of the WI grid.





























TABLE III: Comparison between the structural properties of the WI (G_WI) and the Generated WI (G'_WI). Five instances of G'_WI are shown to illustrate that the metric values are similar. All networks have 14,302 nodes and 18,769 edges.

Networks     L      C      ζ      D_KS   D_KL
G_WI         17.33  0.049  -3.48  0      0
G'_WI(1)     17.40  0.052  -3.99  0.047  0.14
G'_WI(2)     18.36  0.052  -3.65  0.050  0.15
G'_WI(3)     18.36  0.049  -3.99  0.047  0.12
G'_WI(4)     19.06  0.052  -3.61  0.049  0.14
G'_WI(5)     17.79  0.051  -3.50  0.049  0.14



Fig. 13: A part of the Eastern Interconnection (EI) with 12,946
substations (nodes) and 16,658 lines (edges) that operates
under the SERC.


B. SERC 


We apply the GNLG Algorithm to the part of the EI that operates under the SERC (see Fig. 13), which has 12,946 substations (nodes) and 16,658 lines (edges). Fig. 14 shows the network obtained using the GNLG Algorithm with κ = 3, α = 0.5, β = 3.2, γ = 2.5, and N = 5, which were selected empirically following several numerical experiments. In the SDNG Procedure, the SERC nodes were clustered into c = 50 clusters based on the BIC.
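To make the role of the GMM concrete, the sketch below draws synthetic node positions from an already-fitted two-dimensional mixture: it picks a component in proportion to its weight and then samples that component's bivariate normal through a Cholesky factor of its covariance. It assumes the weights, means, and 2x2 covariances are given (for the SERC, they would come from a fit with c = 50 components); it is not the SDNG Procedure itself, which may involve additional steps, and the name `sample_gmm_2d` is hypothetical.

```python
import math
import random


def sample_gmm_2d(weights, means, covs, n, seed=None):
    """Draw n points from a 2-D Gaussian mixture given weights, means, and 2x2 covariances."""
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        # Choose a mixture component with probability proportional to its weight.
        j = rng.choices(range(len(weights)), weights=weights)[0]
        (sxx, sxy), (_, syy) = covs[j]
        # Cholesky factor [[l11, 0], [l21, l22]] of the covariance [[sxx, sxy], [sxy, syy]].
        l11 = math.sqrt(sxx)
        l21 = sxy / l11
        l22 = math.sqrt(syy - l21 ** 2)
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        mx, my = means[j]
        points.append((mx + l11 * z1, my + l21 * z1 + l22 * z2))
    return points


# Two made-up components; a fixed seed makes the generated node set reproducible.
nodes = sample_gmm_2d([0.5, 0.5], [(0.0, 0.0), (100.0, 100.0)],
                      [((1.0, 0.0), (0.0, 1.0)), ((4.0, 1.0), (1.0, 2.0))], 1000, seed=1)
```

Because the fitted mixture is reused and only the seed changes, several distinct node sets with the same spatial density can be drawn without refitting.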

The comparisons of the degree distributions of the nodes and of the length distributions of the lines in G_SERC and G'_SERC are shown in Figs. 15 and 16, respectively. Table IV summarizes the structural properties of G_SERC and five instances generated by the GNLG Algorithm. As with the WI, the results indicate that the Algorithm can generate synthetic networks with structural properties similar to those of the SERC grid.


Fig. 14: A network with 12,946 nodes and 16,658 edges generated based on the SERC grid using the GNLG Algorithm with κ = 3, α = 0.5, β = 3.2, γ = 2.5, and N = 5.

(a) G_SERC (b) G'_SERC (x-axis: log degree)

Fig. 15: The degree distribution of the nodes in G_SERC and G'_SERC (in log-log scale). Linear regression lines with slopes ζ = -3.93 and ζ' = -4.12 are fitted to the distributions of the nodes with degree greater than 2 in G_SERC and G'_SERC, respectively. The KS statistic between the degree distributions is 0.047.


C. FRCC

Finally, we apply the GNLG Algorithm to a smaller part of the EI with 1,312 substations (nodes) and 1,780 lines (edges) that operates under the FRCC (see Fig. 17). As can be seen in Fig. 18, the degree distribution of the nodes in G_FRCC differs from those of G_WI and G_SERC: in G_FRCC, only the density of the nodes with degree 1 lies off the fitted regression line. This suggests that in the Reinforcement Procedure, the step for small networks should be used and nodes should be sampled with probability ∝ (…). Here, we use p = 2.
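The regression lines in the degree-distribution figures (Figs. 11, 15, and 18) amount to an ordinary least-squares fit of log P(k) against log k, restricted to degrees above a threshold (k > 2 for the WI and SERC, k > 1 for the FRCC). The minimal sketch below assumes that P(k) is the fraction of nodes with degree k and uses base-10 logarithms (the slope does not depend on the base as long as both axes share it); the name `tail_slope` is hypothetical.

```python
import math
from collections import Counter


def tail_slope(degrees: list[int], k_min: int) -> float:
    """Least-squares slope of log10 P(k) versus log10 k over degrees k > k_min."""
    n = len(degrees)
    counts = Counter(degrees)
    # One point per distinct degree above the threshold: (log k, log of its empirical density).
    pts = [(math.log10(k), math.log10(c / n)) for k, c in counts.items() if k > k_min]
    if len(pts) < 2:
        raise ValueError("need at least two distinct degrees above k_min")
    mx = sum(x for x, _ in pts) / len(pts)
    my = sum(y for _, y in pts) / len(pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    return sxy / sxx


# Synthetic degrees whose tail follows P(k) ∝ k^-3 exactly; degree-1 nodes are excluded by k_min.
degrees = [1] * 100 + [4] * 64 + [8] * 8 + [16] * 1
print(tail_slope(degrees, k_min=2))
```

Raising `k_min` from 1 to 2 is exactly the difference between the FRCC fit and the WI/SERC fits described in the captions.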

Fig. 17(b) shows the network obtained using the GNLG Algorithm with κ = 1.8, α = 0.5, β = 2.5, γ = 2.8, and N = 5, which were selected empirically. The nodes in the FRCC have been clustered into c = 15 clusters. The comparisons of the degree distributions of the nodes and of the length distributions of the lines in G_FRCC and G'_FRCC are shown in Figs. 18 and 19, respectively. Table V summarizes the structural properties of the FRCC and five instances generated by the GNLG Algorithm. The results suggest that the GNLG Algorithm can generate smaller networks as well.
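For reference, the parameter settings reported in this section can be collected in one place. The sketch below only records the values stated in the text (κ, α, β, γ, N, the number of GMM clusters c, and each grid's size); the class and field names are hypothetical, and the derived average degree 2m/n is a simple sanity check rather than a metric used in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GNLGSettings:
    """Hypothetical record of the GNLG parameters reported for one grid."""
    kappa: float
    alpha: float
    beta: float
    gamma: float
    N: int
    clusters: int  # c, chosen by BIC in the SDNG Procedure
    nodes: int
    edges: int

    @property
    def avg_degree(self) -> float:
        # Each edge contributes to the degree of two nodes.
        return 2 * self.edges / self.nodes


SETTINGS = {
    "WI": GNLGSettings(2.5, 1.0, 3.2, 2.5, 10, 55, 14_302, 18_769),
    "SERC": GNLGSettings(3.0, 0.5, 3.2, 2.5, 5, 50, 12_946, 16_658),
    "FRCC": GNLGSettings(1.8, 0.5, 2.5, 2.8, 5, 15, 1_312, 1_780),
}

for name, s in SETTINGS.items():
    print(f"{name}: avg degree {s.avg_degree:.2f}")
# WI: avg degree 2.62
# SERC: avg degree 2.57
# FRCC: avg degree 2.71
```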


(a) G_SERC (b) G'_SERC (x-axis: log length)

Fig. 16: The length (in km) distribution of the point-to-point lines in G_SERC and G'_SERC, with a nonparametric distribution fit (shown in blue). The KL-divergence between the length distributions of the lines in G_SERC and G'_SERC is 0.081.

TABLE IV: Comparison between the structural properties of the SERC (G_SERC) and the Generated SERC (G'_SERC). Five instances are shown to illustrate that the metric values are similar. All networks have 12,946 nodes and 16,658 edges.

Networks       L      C      ζ      D_KS   D_KL
G_SERC         19.71  0.049  -3.93  0      0
G'_SERC(1)     20.26  0.048  -4.12  0.047  0.081
G'_SERC(2)     19.43  0.045  -4.25  0.044  0.077
G'_SERC(3)     17.56  0.048  -4.72  0.044  0.084
G'_SERC(4)     17.95  0.047  -4.46  0.048  0.083
G'_SERC(5)     19.87  0.049  -4.5   0.046  0.080

(a) G_FRCC

Fig. 17: (a) Part of the Eastern Interconnection (EI) with 1,312 substations (nodes) and 1,780 lines (edges) that operates under the FRCC. (b) A network with the same number of nodes and edges that is generated using the GNLG Algorithm with κ = 1.8, α = 0.5, β = 2.5, γ = 2.8, and N = 5.

(a) G_FRCC (b) G'_FRCC

Fig. 18: The degree distribution of the nodes in G_FRCC and G'_FRCC (in log-log scale). Linear regression lines with slopes ζ = -2.76 and ζ' = -2.40 are fitted to the distributions of the nodes with degree greater than 1 in G_FRCC and G'_FRCC, respectively. The KS statistic between the degree distributions is 0.032.

(a) G_FRCC (b) G'_FRCC

Fig. 19: The length (in km) distribution of the point-to-point lines in G_FRCC and G'_FRCC, with a nonparametric distribution fit (shown in blue). The KL-divergence between the length distributions in G_FRCC and G'_FRCC is 0.12.

TABLE V: Comparison between the structural properties of the FRCC (G_FRCC) and the Generated FRCC (G'_FRCC). Five instances are shown to illustrate that the metric values are similar. All networks have 1,312 nodes and 1,780 edges.

Networks       L      C      ζ      D_KS   D_KL
G_FRCC         11.68  0.075  -2.76  0      0
G'_FRCC(1)     10.81  0.045  -2.40  0.032  0.12
G'_FRCC(2)     11.86  0.057  -2.70  0.025  0.12
G'_FRCC(3)     11.13  0.053  -2.78  0.022  0.10
G'_FRCC(4)     11.27  0.051  -2.86  0.025  0.13
G'_FRCC(5)     11.66  0.057  -2.36  0.015  0.12


VI. Conclusions

In this paper, we developed the GNLG Algorithm for generating synthetic power grid networks with structural properties similar to those of a given network. We applied the algorithm to the WI and to two parts of the EI (SERC and FRCC) and showed that it can generate networks with structural properties similar to those of these grids. In a broader perspective, the algorithm can be used for anonymizing network data that cannot be published otherwise, thereby enabling research on power grid vulnerability and resilience.

This is only a first step towards the generation of synthetic power grid networks, and there are clearly several future research directions. In particular, for a given network, step 1 of the GNLG Algorithm and the parameter tuning need to be done only once; afterwards, the algorithm can be used to generate several networks similar to that network. Hence, we plan to provide a web application that would allow obtaining synthetic networks similar to given reliability regions of the North American power grid with a specific set of parameters (currently, it takes less than 3.5 minutes for our server to generate a synthetic network similar to the WI). Moreover, we plan to improve the algorithm and to focus on the locations of power generators and demand nodes as well as on generation and demand values. Generation of topologies that take line voltages into account is also an interesting open problem. Finally, we believe that the approach can be extended to generating various types of spatially distributed networks.